(54) Abstract Title

Debug coprocessor for data processing systems 

(57) A coprocessor 6 is coupled to a main processor 4 via a coprocessor interface CP and is responsive to
coprocessor instructions MCR, MRC within the stream of instructions to perform coprocessor operations. The
coprocessor 6 is a debug coprocessor operable to at least partially control generation of diagnostic data for
debugging the main processor, where the coprocessor instructions are debug coprocessor instructions that
control operation of said debug coprocessor. Using a debug mechanism in the form of a debug coprocessor
reduces the impact of the debug mechanism upon normal operation. The main processor also has separate
scan chains for instruction transfer and data transfer to/from debug logic. The main processor 4 is driven by a
main processor clock signal clk and at least a portion of the debug logic (JTAG 12) is driven by a debug clock
signal tck.

DEBUG MECHANISM FOR DATA PROCESSING SYSTEMS

This invention relates to data processing systems. More particularly, this 
invention relates to debugging mechanisms for data processing systems. 

A problem with debugging mechanisms is that they should not interfere with
or limit the performance possible during normal operation. As an example, known
debugging mechanisms often involve the addition of multiplexers within the data
processing paths to allow configuration of the debugging mechanisms, such as
programming breakpoints and watchpoints. These additional circuit elements that are
only needed for debug purposes can impose signal propagation delays within critical
data paths that limit the maximum performance of the data processing system during
normal operation.

Viewed from one aspect the present invention provides an apparatus for
processing data, said apparatus comprising:

a main processor responsive to main processor instructions within a stream of
instructions input to said main processor to perform main processor operations;

a coprocessor coupled to said main processor via a coprocessor interface and
responsive to coprocessor instructions within said stream of instructions to perform
coprocessor operations; wherein

said coprocessor is a debug coprocessor operable to at least partially control
generation of diagnostic data for debugging said apparatus for processing data and
said coprocessor instructions are debug coprocessor instructions that control operation
of said debug coprocessor.

The invention recognises that the mechanisms and structures normally used for
coprocessors can be used to provide a debugging system that has a reduced impact
upon the normal operation of the system. Additionally, the main processor is often
already designed in a manner to facilitate operating and communicating with
coprocessors in a manner that does not restrict the performance of the main processor.
The invention exploits this feature by providing a debugging mechanism in the form
of a debug coprocessor. This debug coprocessor can be configured via the
coprocessor interface in a manner that has little impact upon the normal performance
of the main processor.

A particularly effective way of configuring the debug coprocessor is via one
or more debug coprocessor registers.

The coprocessor instruction sets associated with main processor and
coprocessor systems typically include coprocessor instructions that write values to
registers within a coprocessor or read values from registers within a coprocessor. In
this way, configuration data can be written to a debug coprocessor and diagnostic data
recovered from a debug coprocessor.

Highly useful debug mechanisms are those capable of performing breakpoint
and watchpoint functions. These breakpoint and watchpoint values need to be
programmed and stored. This need can be met highly effectively by the use of
registers within the debug coprocessor to store the desired breakpoint and watchpoint
values.

Control data associated with more sophisticated breakpoint and watchpoint
operation, such as mask values, enable bits, and mode selection values, may also be
efficiently programmed and stored using registers within the debug coprocessor.
Accordingly, comparisons against address attributes such as the size of the transfer,
the mode (e.g. privileged/user), an instruction set indicating bit (the ARM Thumb
T-bit), etc., may also be made.

The coprocessor registers may be accessed via coprocessor instructions within
the instruction stream passed to the main processor. This enables both software
running on the data processing system being debugged and an external scanning
mechanism to access the coprocessor registers to configure the debugging operation
by issuing identical instructions to the main processor. Thus, software running on the
main processor core may feed instructions intended for the debug coprocessor into a
main processor pipeline in a normal fashion, whereas a scanning mechanism can scan
in instructions intended for the debug coprocessor one at a time through an instruction
transfer register. These scanned-in instructions are then executed one at a time at
full speed by being issued as instructions into the same main processor pipeline.

Alternatively and/or additionally, at least some debug coprocessor registers
may be accessed via a serial scan chain operating under control of a scan chain
controller. This allows external programming of the debug mechanism in the form of
the debug coprocessor to be achieved by external hardware and software.

The registers accessible via scan chain mechanisms preferably include a
register for allowing instructions to be serially scanned into the system and then be
executed by either the main processor or the debug coprocessor (or any other
coprocessor, such as a floating point unit coprocessor, attached to the coprocessor
interface).

In a similar manner it is preferable that the registers accessible via scan chain
mechanisms include a register for allowing a data value to be serially scanned into the
system or out from the system. Applying and/or recovering data values in this way is
highly useful in diagnostic operation.

In order to deal with the potential problems of the main processor or a
coprocessor trying to access such a data value register at the same time that it was
being accessed by the scan chain, preferred embodiments provide such a data value
register in the form of two data value registers, one of these being writable by the
main processor or a coprocessor and readable by a scan chain and the other of these
being readable by the main processor or a coprocessor and writable by a scan chain.
This effectively forms a bidirectional communications channel that avoids potential
data conflicts.

Another of the registers within the debug coprocessor provided by preferred
embodiments is a debug status control register that can be read from and written to
and that stores information such as the entry condition into the debug mode, a debug
enable bit and flags controlling main processor vector instruction trap operation.

The debug coprocessor also preferably is able to use and to generate control 
signals that can be passed to the rest of the system to perform functions such as 
pipeline drains, pipeline holds and instruction cancellation. 

The debug coprocessor will typically operate at the same clock frequency as
the main processor and any other coprocessors in the system. This facilitates the
interactions between these elements using a standard coprocessor interface such that
the debug coprocessor has a reduced impact upon the speed of normal operation of the
main processor and any other coprocessors. However, the scan chain mechanisms
will typically operate at a different, typically asynchronous, clock speed and so the
debug coprocessor needs to include circuits that allow these elements within different
clock domains to communicate. These extra mechanisms may be isolated within the
debug coprocessor in a manner that avoids interference with the normal operation of
the other main circuit elements.

Viewed from another aspect the present invention provides a method of
processing data, said method comprising the steps of:

in response to main processor instructions within a stream of instructions input
to a main processor performing main processor operations;

in response to coprocessor instructions within said stream of instructions
controlling a coprocessor coupled to said main processor via a coprocessor interface
to perform coprocessor operations; wherein

said coprocessor is a debug coprocessor operable to at least partially control
generation of diagnostic data for debugging said apparatus for processing data and
said coprocessor instructions are debug coprocessor instructions that control operation
of said debug coprocessor.

The invention will now be described, by way of example only, with reference
to the accompanying drawings in which:

Figure 1 schematically illustrates a data processing system including a debug
mechanism;

Figure 2 schematically illustrates a debug coprocessor that is part of the debug
mechanism of the data processing system of Figure 1;

Figure 3 schematically illustrates a data transfer register that is part of the
debug mechanism of the data processing system of Figure 1;

Figure 4 illustrates execution of a debug instruction;

Figure 5 illustrates repeated execution of a debug instruction;

Figure 6 shows an alternative view of the debug coprocessor of Figure 2;

Figure 7 illustrates a scan chain controller that is part of the debug mechanism
of the data processing system of Figure 1;

Figure 8 illustrates the timing of part of the handshaking control of the data
transfer register of Figure 3; and

Figure 9 illustrates the timing of handshaking signals between the debug
system and the main processor that control the switching between the hardware debug
mode and the normal mode.

Figure 1 shows a data processing system 2 that includes a main processor 4, a
debug coprocessor (and system coprocessor) 6 and a floating point unit coprocessor 8.
The main processor 4 is coupled via a coprocessor interface in the form of a
coprocessor bus CP to the debug coprocessor 6 and the floating point unit coprocessor
8. The form of this coprocessor bus CP is substantially the same as a standard
coprocessor bus, such as the ARM coprocessor bus (as used with microprocessors
produced by ARM Limited of Cambridge, England).

The main processor 4 is coupled to a data bus DB and via a prefetch unit 10 to
an instruction bus IB. The data bus DB and the instruction bus IB both include
address portions that are monitored by the debug coprocessor 6 to identify watchpoints
and breakpoints respectively.

The main processor 4, the debug coprocessor 6 and the floating point unit
coprocessor 8 are all driven by a common main processor clock signal clk at a main
processor clock frequency. A scan chain controller 12 also forms part of the debug
logic together with the debug coprocessor. The scan chain controller 12 is driven by a
debug clock signal tck that typically has a different frequency to the main processor
clock signal clk and is asynchronous with the main processor clock signal clk.

A data transfer register DTR and an instruction transfer register ITR are
coupled to the scan chain controller 12 such that they may both be written to and read
from via separate serial scan chains. Whilst not illustrated, in some modes the
separate scan chains of the data transfer register DTR and the instruction transfer
register ITR may be joined together to form a single scan chain. The scan chain
controller 12 is of the type specified in the IEEE 1149.1 JTAG standard. The system
is controlled such that whenever the scan chain controller 12 passes through the
Run-Test/Idle state within the TAP controller states, the instruction transfer register ITR
issues its contents to the prefetch unit 10, by asserting a valid instruction line, as an
instruction to be passed to the pipeline 14 of the main processor 4 and executed. It will
be appreciated that the instruction passed from the prefetch unit 10 to the main
processor 4 may be an instruction that is executed by the main processor 4 itself or
one that is intended for one of the coprocessors 6, 8 coupled via the coprocessor bus
CP to the main processor 4.
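
By way of illustration only, the Run-Test/Idle trigger described above can be sketched in Python. The next-state table is that of the standard IEEE 1149.1 TAP controller; the function name count_itr_issues, and the choice to count one issue of the instruction transfer register ITR per entry into Run-Test/Idle (rather than per tck cycle spent there), are assumptions made for this sketch and are not taken from the described embodiment.

```python
# Next-state table of the IEEE 1149.1 TAP controller.
# Each entry maps a state to (next state if TMS=0, next state if TMS=1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def count_itr_issues(tms_bits, state="Test-Logic-Reset"):
    """Walk the TAP state machine and count entries into Run-Test/Idle.

    Assumption: the ITR contents are issued once each time the controller
    enters Run-Test/Idle from another state.
    """
    issues = 0
    for tms in tms_bits:
        next_state = TAP_NEXT[state][tms]
        if next_state == "Run-Test/Idle" and state != "Run-Test/Idle":
            issues += 1
        state = next_state
    return issues, state
```

In this model a data register scan that ends by returning to Run-Test/Idle triggers one instruction issue, whereas a scan that goes from Update-DR directly to Select-DR-Scan triggers none, which is what allows several registers to be loaded before the instruction is issued.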

Two public instructions HALT and RESTART have been added to the normal 
JTAG instruction set to halt and restart the main processor's normal operations as will 
be discussed further below. 

Figure 2 illustrates the debug coprocessor 6 in more detail. Coprocessor
instructions intended to be executed by a coprocessor rather than the main processor 4
include a coprocessor number operand field. In this way only the coprocessor with the
matching coprocessor number will respond to the coprocessor instruction and execute
it. In the present case, the debug coprocessor is given the number CP14. The system
coprocessor normally associated with an ARM main processor is given the
coprocessor number CP15 and contains various control registers. Figure 2 illustrates
the debug coprocessor CP14.
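
As a hedged illustration of the coprocessor number operand field, the Python sketch below packs the fields of an ARM MCR or MRC instruction word using the standard ARM coprocessor register transfer encoding. The function names are hypothetical, and the use of CRn c1 to stand for a CP14 register R1 in the example is an assumption made for illustration, not a statement of the embodiment's register numbering.

```python
def encode_cp_reg_transfer(cp_num, crn, rd, read,
                           opc1=0, crm=0, opc2=0, cond=0xE):
    """Pack an ARM MCR (read=False) or MRC (read=True) instruction word.

    Layout: cond[31:28] 1110[27:24] opc1[23:21] L[20] CRn[19:16]
            Rd[15:12] cp_num[11:8] opc2[7:5] 1[4] CRm[3:0]
    """
    fields = ((cond, 4), (opc1, 3), (crn, 4), (rd, 4),
              (cp_num, 4), (opc2, 3), (crm, 4))
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("field out of range")
    word = (cond << 28) | (0b1110 << 24) | (opc1 << 21) | (int(read) << 20)
    word |= (crn << 16) | (rd << 12) | (cp_num << 8)
    word |= (opc2 << 5) | (1 << 4) | crm
    return word


def coprocessor_number(word):
    """Extract the coprocessor number field that selects which coprocessor responds."""
    return (word >> 8) & 0xF


# MRC p14, 0, r0, c1, c0, 0 (illustrative read of a CP14 register into r0)
read_cp14_r1 = encode_cp_reg_transfer(cp_num=14, crn=1, rd=0, read=True)
```

Only bits 11:8 differ between an access to CP14 and the same access to CP15, which is why the same instruction path can serve both the debug coprocessor and the system coprocessor.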

The debug coprocessor 6 includes a bank of watchpoint registers 16 and a
bank of breakpoint registers 18. These registers 16, 18 within the debug coprocessor
6 respectively store watchpoint addresses and breakpoint addresses. Associated with
each watchpoint register 16 and breakpoint register 18 is a respective control register
20, 22. These control registers 20, 22 store control data such as mask values to be
applied to the comparisons of the instruction bus IB and the data bus DB to the
respective breakpoint and watchpoint values. The control registers can also store a
flag to enable or disable their associated breakpoint or watchpoint as well as a value
indicating in which modes of operation that breakpoint or watchpoint is active.

The watchpoint values from the watchpoint registers 16 are compared by
respective watchpoint comparators 24 with data addresses on the data bus DB whilst
breakpoint values from the breakpoint registers 18 are compared by respective
breakpoint comparators 26 with instruction addresses on the instruction bus IB. If a
watchpoint match or a breakpoint match is identified, then this is indicated to the rest
of the circuit by a watchpoint hit signal WPH or a breakpoint hit signal BPH. A
debug enable signal DE is also generated. The watchpoint hit signal WPH, the
breakpoint hit signal BPH and the debug enable signal DE can be considered part of
the interface which exists between the debug coprocessor 6 and the main processor 4,
and may include additional signals to indicate main processor status, to indicate the
core should halt or to indicate that the core should restart.
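
The comparison performed by the comparators 24, 26 can be sketched as follows. This is a minimal Python model under stated assumptions: the class and function names are hypothetical, set mask bits are taken to mean "ignore this address bit" (the polarity is not specified above), and the DE output is modelled simply as the OR of the two hit signals.

```python
from dataclasses import dataclass


@dataclass
class Breakpoint:
    """One breakpoint (or watchpoint) register with its control register."""
    address: int                  # value held in the address register
    mask: int = 0                 # assumed polarity: set bits are ignored
    enabled: bool = True          # enable flag from the control register
    modes: frozenset = frozenset({"user", "privileged"})


def hit(bp, bus_address, mode):
    """Return True when the bus address matches an enabled, active register."""
    if not bp.enabled or mode not in bp.modes:
        return False
    care = ~bp.mask & 0xFFFFFFFF
    return (bus_address & care) == (bp.address & care)


def debug_event(breakpoints, watchpoints, instr_addr, data_addr, mode):
    """Model the BPH, WPH and DE outputs for one bus cycle.

    Assumption: DE is asserted whenever either hit signal is asserted.
    """
    bph = any(hit(b, instr_addr, mode) for b in breakpoints)
    wph = any(hit(w, data_addr, mode) for w in watchpoints)
    return bph, wph, bph or wph
```

Because every comparison reads only register state held inside the debug coprocessor, nothing in this model needs to be inserted into the data paths of the main processor.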

The debug coprocessor 6 also includes a debug status control register DSCR 28
that is coupled to the coprocessor bus CP. All of the registers 16, 18, 20, 22, 28
discussed above can be both written to and read from by coprocessor instructions such
as ARM MCR and MRC instructions. This enables the debug coprocessor to be
configured via the coprocessor interface without having to add additional circuitry
into the main processor 4, with its associated potential for slowing critical paths.

Figure 3 illustrates the data transfer register DTR in more detail. In particular,
the data transfer register DTR is formed of two registers 30, 32. The first register 30
is writable via the scan chain input tdi and readable by the main processor 4. The
second register 32 is readable via the scan chain output tdo and is writable by the main
processor 4. Together the first register 30 and the second register 32 form a
bidirectional communications channel in which conflicting writes cannot occur.

The second register 32 stores thirty two bits of data and has a 33rd bit formed
by the multiplexer 34. The multiplexer 34 operates under control of a controller 36 to
select one of four inputs. The first input is the output from the main portion of the
second register 32 and this is selected when serially scanning out the contents of the
main portion of the second register 32. The other three inputs to the multiplexer 34
handle handshaking between the portion of the system operating in the main processor
clock signal clk domain and the portion of the system operating in the debug clock
signal tck domain. In particular, these signals are arranged such that they may be read
by external debug circuitry via the tdo output to prevent the data transfer register from
being read to external debug circuitry before it is full and similarly to prevent the data
transfer register from being read by the main processor 4 until it has been properly
loaded. Furthermore, the external debug circuitry can read the 33rd bit to determine
whether data that it has placed within the data transfer register DTR has been used by
the main processor 4 before a further attempt is made to load data into the data
transfer register DTR. The same is true of preventing the main processor 4 trying to
load data into the data transfer register DTR before it has been collected by the
external debug circuitry. A PipeEmpty signal may also be selected by the multiplexer
34 as the 33rd bit pollable by the debug system to determine that the instruction
pipeline 14 of the main processor 4 is empty with no pending instructions to be
executed.

In operation, as data is being serially scanned into the first register 30 on line
tdi, data bits from the second register 32 are scanned out on line tdo. The converse is
also true in that as data is being serially scanned out of the second register 32 on line
tdo, data bits are input to the first register 30 on line tdi. Thus, data may be
simultaneously recovered from and written to the data transfer register DTR thereby
increasing the speed of debug operations.
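
The simultaneous scan-in and scan-out can be illustrated with a small Python model. It is a sketch only: the function name scan_dtr is hypothetical, the shift is assumed to be least significant bit first, and the 33rd status bit is left out so that only the thirty two data bits are modelled.

```python
def scan_dtr(host_value, w_dtr, width=32):
    """Shift `width` bits through both halves of the DTR in one pass.

    host_value: word the external debugger presents on tdi
    w_dtr:      current contents of the second register 32
    Returns (new contents of the first register 30, word captured from tdo).
    """
    r_dtr = 0
    host_capture = 0
    for i in range(width):
        tdi = (host_value >> i) & 1      # bit driven by the external debugger
        tdo = w_dtr & 1                  # bit leaving the second register 32
        w_dtr >>= 1
        # New bit enters the first register 30 at the top and moves down.
        r_dtr = (r_dtr >> 1) | (tdi << (width - 1))
        host_capture |= tdo << i
    return r_dtr, host_capture
```

A single pass of `width` clock cycles therefore both delivers a new word to the main processor side and collects the previous result, rather than needing one scan for each direction.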

Figure 4 illustrates execution of a debug instruction supplied to the main
processor 4 by the scan chain controller 12. The main processor 4 continues to be
driven by the main processor clock signal clk with the scan chain controller being
driven by the debug clock signal tck. At step 38 a debug trigger occurs, such as the
execution of a breakpoint instruction, an externally applied breakpoint signal or
detection of a breakpoint or watchpoint trigger. The debug trigger causes the main
processor 4 to suspend normal processing at step 40. The pipeline 14 then drains of
existing instructions during step 42 prior to executing a sequence of no-operation
instructions in the loop formed of steps 44, 46. Step 46 checks whether the scan chain
controller 12 has indicated (in this embodiment as a result of passing through the
Run-Test/Idle state) that the contents of the instruction transfer register ITR should be
issued as an instruction into the pipeline 14.

Prior to step 46 being passed, the debug side of the system, which includes the
scan chain system, operating under control of the debug clock signal tck serves to first
scan in an instruction into the instruction transfer register ITR at step 48. When the
instruction has been scanned in, the scan chain controller is moved through the
Run-Test/Idle state at step 50 which indicates to the main processor side of the system that
an instruction is ready in the instruction transfer register ITR to be issued down the
pipeline 14.

When step 50 has completed, step 52 serves to transfer the instruction from the
instruction transfer register ITR into the pipeline 14 from where it is executed at step
54. When the instruction has completed, a debug instruction done signal is passed
back to the debug side (step 56) so that the debug side knows that the instruction it
placed in the instruction transfer register ITR has been completed and that further
instructions can be placed into the instruction transfer register ITR if desired. The
debug side notes the issue of the debug instruction done signal at step 58 and at step
60 clears the signal it asserted to indicate that there is a valid instruction waiting to be
executed within the instruction transfer register ITR. The main processor side notes
the clearing of the signal by the debug side at step 62 and then returns processing to
the loop 44, 46. The steps 56, 58, 60, 62 effectively perform a handshaking operation
between the portions operating within different clock domains thereby
accommodating the frequency difference and the unknown phase relationship.
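
The ordering of steps 44 to 62 can be sketched as a small Python model. The class name, method names and the two flag names are hypothetical, and the synchronisers between the clk and tck domains are deliberately omitted, so the model shows only the order of events and not their timing.

```python
class DebugInstructionChannel:
    """Event-ordering model of the Figure 4 handshake (no clock modelling)."""

    def __init__(self):
        self.itr = None
        self.issue = False    # asserted by the debug side (tck domain)
        self.done = False     # asserted by the main processor side (clk domain)
        self.executed = []

    def debug_scan_in(self, instruction):
        """Step 48: scan an instruction into the ITR."""
        if self.issue or self.done:
            raise RuntimeError("previous instruction still in flight")
        self.itr = instruction

    def debug_run_test_idle(self):
        """Step 50: passing through Run-Test/Idle marks the ITR as valid."""
        self.issue = True

    def main_poll(self):
        """One pass of the loop 44, 46 on the main processor side."""
        if self.issue and not self.done:
            self.executed.append(self.itr)   # steps 52 and 54
            self.done = True                 # step 56
            return "executed"
        if self.done and not self.issue:
            self.done = False                # step 62
            return "rearmed"
        return "nop"                         # step 44

    def debug_poll(self):
        """Steps 58 and 60: note the done signal and clear the issue signal."""
        if self.done and self.issue:
            self.issue = False
            return True
        return False
```

In this model the main processor side executes each scanned-in instruction exactly once, however many times it polls, because the done flag blocks re-execution until the debug side has withdrawn its issue flag.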

Thus, it will be seen that the debug portion of the circuit operates with a debug
clock signal tck whilst the main processor portion of the circuit operates with the main
processor clock signal clk and copes with the different clock rates by executing
no-operation instructions and by the handshaking processes.

Figure 5 illustrates the multiple execution of a debug instruction. At step 64
an instruction such as an ARM STC instruction is scanned into the instruction transfer
register ITR. At step 66, a data value is scanned into the data transfer register DTR.
At step 68, the scan chain controller 12 is moved through the Run-Test/Idle state and
the debug instruction issue signal is set. The main processor side detects the setting of
the debug instruction issue signal and at step 70 executes the instruction stored in the
instruction transfer register ITR (see Figure 4) by moving the data value from the data
transfer register DTR to a memory (memory circuit not shown). When this instruction
has completed, this is indicated back to the debug side which scans in a further data
value to the data transfer register at step 72. The instruction stored within the
instruction transfer register ITR does not need to be changed by the external system
and so no time is consumed in transferring in another instruction. When the next data
value is in place, the scan chain controller 12 is moved through the Run-Test/Idle state
at step 74 and the process continues. Steps identical to steps 68 and 70 can be
performed a large number of times to efficiently perform a block memory transfer.
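
The saving obtained by leaving the instruction in the instruction transfer register ITR can be shown with a short Python sketch. The function name block_write and the scan counters are hypothetical, and the store is assumed to advance the address by one word after each transfer; the addressing behaviour of the scanned-in instruction is not specified above.

```python
def block_write(values, base_address, word_size=4):
    """Model the Figure 5 block transfer and count the scans it needs.

    Assumption: each execution of the scanned-in store writes one word and
    then advances the address by `word_size`.
    """
    memory = {}
    scans = {"itr": 0, "dtr": 0}

    scans["itr"] += 1                    # step 64: instruction scanned in once
    address = base_address
    for value in values:
        scans["dtr"] += 1                # steps 66, 72: scan in the next value
        # Steps 68, 70: pass through Run-Test/Idle, then the same
        # instruction moves the DTR value to memory.
        memory[address] = value & 0xFFFFFFFF
        address += word_size
    return memory, scans
```

For a transfer of N words the model performs N data scans but only one instruction scan, which is the efficiency the repeated execution is intended to provide.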

Figure 6 illustrates in an alternative way a portion of the debug coprocessor 6.
A coprocessor instruction is passed to the debug coprocessor 6 via the bus CPinstr
from where it is registered into an instruction register 76. The latched instruction is
passed to a decoder 78 which in turn generates control signals for the various registers
of the debug coprocessor 6 and the output multiplexer 80. Register writes are made in
response to ARM MCR instructions and register reads are made in response to ARM
MRC instructions. When a register write is being performed, the register enable
signal of the appropriate register is asserted. When a register read is being performed,
then the appropriate register output is enabled and the multiplexer 80 switched to
select this output.

The registers 82 are system coprocessor CP15 registers that can be used by the
external debug circuitry to perform cache and MMU operations during debug. The
debug status control register DSCR 28 is treated as coprocessor CP14 register R1 and can
be both written to and read from. The instruction transfer register ITR may only be
written by the scan chain as shown in Figure 1. The data transfer register DTR can be
both written to and read from. The data transfer register is treated as coprocessor
CP14 register R5.

One each of the breakpoint registers 18 and the watchpoint registers 16 are
illustrated. These are associated with respective control registers 22, 20.
Comparators 24, 26 are associated with these registers and serve to generate the debug
event signal DE.

Figure 7 schematically illustrates a portion of the scan chain system. A TAP
controller 84 in accordance with usual JTAG operation is provided. The external
serial data input and output are provided by the lines tdi and tdo. A JTAG instruction
may be registered within instruction register IR 86. A particular scan chain can be
selected using the scan chain select register 92 operating in conjunction with the
multiplexers 90, 88. The two portions of the data transfer register 30, 32 together
constitute a single scan chain. The instruction transfer register ITR and the debug
status and control register DSCR 28 also form separate scan chains. A read only
IDCode scan chain and a Bypass scan chain are also provided.

Figure 8 illustrates, by way of example, the timing of the signals passed
between the scan chain controller 12 and the main processor 4 as part of the
handshaking involved in control of the second register wDTR 32 that is part of the
data transfer register DTR. In Figure 8 the set term is controlled by the main
processor 4 operating in the main processor clock signal domain and the clear term is
controlled by the scan chain controller 12 operating in the debug clock signal domain.
In more detail, the set term and the clear term are respectively controlled by their own
four-state state machines (the different states of which are indicated by the two digit
binary numbers in Figure 8).

In operation the main processor 4 first writes to the wDTR register 32, moving
the set term from low to high and causing wDTRFull to be asserted (the wDTRFull
signal may at this point be selected as the 33rd bit of the DTR register by multiplexer
34). This particular version of the wDTRFull indicator exists only in the tck domain.
The external debug circuitry is polling the 33rd bit of the DTR register watching for
this transition to indicate that there is data to collect. An indicator also exists for use
by the main processor 4 in its clk domain.

The debug side then reads out the wDTR register 32 and when this is finished
transitions the clear term from low to high to indicate to the set term state machine
that the data has been read. The transition in the clear term is passed via a
synchroniser to the set term state machine which responds by transitioning the set
term from high to low thereby indicating to the clear term state machine that the
change in the clear term has been noted. The transition in the set term from high to
low is passed by a synchroniser back to the clear term state machine that moves the
clear term from high to low as the information that the wDTR register 32 has been
read has been acknowledged. The transition of the clear signal from high to low is
passed by a final synchroniser from the clear term state machine to the set term state
machine to reset the set term state machine such that the system is ready for the main
processor 4 to write another value to the wDTR register 32 if desired.
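
The sequence of set term and clear term transitions can be sketched in Python as follows. The class and method names are hypothetical, synchroniser latency is ignored, and wDTRFull is modelled as "set term high and clear term low", which is an assumption consistent with the sequence described but not stated in it.

```python
class WdtrHandshake:
    """Ordering model of the Figure 8 set/clear handshake for wDTR 32."""

    def __init__(self):
        self.set_term = False     # controlled from the clk domain
        self.clear_term = False   # controlled from the tck domain
        self.data = None

    @property
    def wdtr_full(self):
        # Assumed definition: full from the write until the read completes.
        return self.set_term and not self.clear_term

    def core_write(self, value):
        """Main processor writes wDTR; refused until the handshake is idle."""
        if self.set_term or self.clear_term:
            return False
        self.data = value
        self.set_term = True      # set term low to high
        return True

    def debugger_read(self):
        """Debug side collects the value and raises the clear term."""
        if not self.wdtr_full:
            return None
        self.clear_term = True    # clear term low to high
        return self.data

    def settle(self):
        """Run the remaining acknowledge phases in order."""
        if self.set_term and self.clear_term:
            self.set_term = False     # set term high to low: read noted
        if self.clear_term and not self.set_term:
            self.clear_term = False   # clear term high to low: ready again
```

The model refuses a second write until both terms have returned low, mirroring the way the handshake prevents the main processor 4 from overwriting a value that has not yet been collected.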

The rDTRFull indicator shown in Figure 3 relates to the rDTR register 30 and
may be created in a similar manner, with the tck and clk signals exchanged and with
reads and writes exchanged.

Figure 9 illustrates the timing of handshaking signals passed between the main
processor 4 and the debug system when restarting (in response to a debug RESTART
instruction) the main processor 4 from the debug state in which it has been operating
as shown in Figures 4 and 5.

The RestartCore signal exists in the tck domain and the DbgInDebug and
CoreInDebug signals exist in the clk domain. As the first step in the restarting
operation, the debug control logic responds to a RESTART signal by setting a core
restarted bit to 0 and moving a RestartCore signal from low to high. This transition in
RestartCore is passed via a synchroniser to the main processor 4 that responds by
transitioning the DbgInDebug signal from high to low. This bit in turn is passed
through a synchroniser back to the debug control logic where it deasserts the
RestartCore signal and asserts the CoreRestarted signal. The core restarted bit must
be polled by the external debug circuitry to ensure that the core has restarted. The
transition in the RestartCore signal is passed via a synchroniser to the main processor
4 which responds by moving the CoreInDebug signal from high to low to transition
the main processor 4 from debug state to normal state.



Figures 8 and 9 give two examples of handshaking control signals passed
between the main processor side operating in the main processor clock signal domain
and the debug side operating in the debug clock signal domain. Handshaking between
further control signals can be handled in a similar manner.

A further detailed view of the debug mechanisms described above is also
presented in the following Microarchitecture Specification in relation to the ARM10
microprocessor and the ARM microprocessor architecture in general:

1.0 Basic Intent

1) To provide a future proof debug interface to ARM processors that will extend
across many implementations

2) To provide access to hardware debug facilities from both JTAG hardware and
target based software

2.0 Background

The debug logic in ARM10 provides the user with the ability to support functions
similar to those found in in-circuit emulators, i.e., the ability to set up breakpoints and
watchpoints, to inspect and modify the processor and system state, and to see a trace of
processor activity around points of interest. Normally, software on a host computer acts
as an interface to the debugger, removing the end user completely from the particular
protocols that the on-chip logic requires.

ARM10 is an opportunity to clean up the debug interface presented by ARM7 and
ARM9; that is to kill off unused functionality and add new functionality. Because of the
complexities of ARM10's pipeline, the ARM7 and ARM9 method of jamming
instructions into the pipeline has two costs: it makes ICEman software more complex,
and it has hardware critical path implications that ARM10 must avoid.

There are no mask registers, chain, or range functions in ARM10. There are also no
data dependent watchpoints or breakpoints. However, ARM10 does implement
breakpoint instructions to replace the most common use for data dependency, which is
to recognize a user-specified instruction for breakpoints in an instruction stream. We
have dropped the mask registers in favor of more actual breakpoint registers.

3.0 Using ARM10 Debug

Debug on ARM10 and future machines (hereafter just called ARM10 debug) is centered
around coprocessor 14 (CP14). All programming of ARM10 debug configurations is
done via CP14 registers. Software executing on the target, i.e., a debug monitor or
operating system debug task (hereafter called a debug task), can access debug hardware
features merely by writing to a CP14 register. Register breakpoints can cause
exceptions, allowing target trap handlers to take control when a register breakpoint is
hit. The debug functions are accessed through both software on the target and the JTAG
port. JTAG programming is performed by feeding the processor instructions one at a
time, followed optionally by a burst of data, but the interface to enter, exit and perform
debug steps is much cleaner and more properly defined.

3.1 Coprocessor 14 

3.1.1 CP14 Interface

A coprocessor interface will act as the communication mechanism between the ARM
core and the debugger, and both CP14 and CP15 accesses will be handled by this block.
The coprocessor interface is similar to the VFP coprocessor interface, but has fewer
control signals and narrower busses. The interface should contain the following signals:



inputs:          description:

ASTOPCPE         indicates an ARM stall in execute
ASTOPCPD         indicates an ARM stall in decode
ACANCELCP        cancel instruction in CP
AFLUSHCP         flush coprocessor pipeline
LDCMCRDATA       bus containing input data from ARM core
CPINSTR[31:0]    instruction to the coprocessor from ARM
CPSUPER          coprocessor supervisor mode
CPLSLEN          length indicator to the LSU
CPLSSWAP         swap indicator to the LSU
CPINSTRV         indicates a valid instruction to the CP
CPRST            coprocessor reset
CPCLK            coprocessor clock



outputs:         description:

CPBUSYD          busy wait in decode
CPBUSYE          busy wait in execute
CPiSSERIALIZE    force the core to hold an instruction in decode
CPBOUNCEE        CP rejects instruction
STCMRCDATA       bus containing output data to ARM core



3.1.2 CP14 Register Map

All debug state is mapped into CP14 as registers. Three CP14 registers (R0, R1, R5)
can be accessed in privileged mode by a debug task, and four registers (R0, R1, R4, R5)
are accessible as scan chains. The remaining registers are only accessible in privileged
mode from a debug task. Space is reserved for up to 16 breakpoints and 16
watchpoints. A particular implementation may implement any number from 2 to 16.
For ARM10, there will be six (6) instruction-side breakpoints and two (2) data-side
watchpoints.

The breakpoint registers will come up out of reset in privileged mode and disabled. Note
that the Debug ID Register (R0) is read-only. Also, be aware that the matching bits in
the watchpoint and breakpoint control registers contain an encoding which allows
programmers to specify a field (e.g., a size, supervisor/user) by setting the appropriate
bit. The user should set bits to all ones if the field is not being considered for matching.

There are two ways to disable debugging. A Global Enable bit in the DSCR is used to
enable or disable all debug functionality via software. Upon reset, the bit is cleared,
which means all debug functionality is disabled. All external debug requests are ignored
by the core, and BKPT instructions are treated as no-ops. The intent of this mode is to
allow an operating system to quickly enable and disable debugging on individual tasks
as part of the task switching sequence. In addition, the DBGEN pin allows the debug
features of the ARM10 to be disabled. This signal should be tied LOW only when
debugging will not be required.

Because of the large number of registers to support, the CRm and opcode2 fields are
used to encode the debug register number, where the register number is
{opcode2,CRm}.
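
As a minimal sketch of this encoding (Python is used purely for illustration; the function names are hypothetical and not part of this specification), a register number can be split into its opcode2 and CRm fields. The sketch assumes that {opcode2,CRm} is a concatenation with CRm as the low four bits, which agrees with the instruction list in 3.1.3, where R64 is reached with CRm = c0 and opcode2 = 4.

```python
def split_debug_reg(reg_num):
    """Return (opcode2, CRm) for a CP14 debug register number.

    Assumption: opcode2 holds the upper 3 bits and CRm the lower 4 bits
    of a 7-bit register number (R0 to R127).
    """
    if not 0 <= reg_num <= 127:
        raise ValueError("register number must be in 0..127")
    return reg_num >> 4, reg_num & 0xF


def join_debug_reg(opcode2, crm):
    """Inverse of split_debug_reg: rebuild the register number."""
    return (opcode2 << 4) | crm
```

Under this assumption, R85 (a breakpoint control register) maps to opcode2 = 5 and CRm = 5, and R113 maps to opcode2 = 7 and CRm = 1.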

R0: Debug ID Register

DIDR[31:24]  Designer code, as for CP15 R0
DIDR[23:20]  Zero
DIDR[19:16]  Debug Architecture Version, ARM10 = 0b0000
DIDR[15:12]  Number of implemented register breakpoints
DIDR[11:8]   Number of implemented watchpoints
DIDR[7:4]    Zero
DIDR[3:0]    Revision number
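
The field layout of the Debug ID Register can be illustrated with a short decoding sketch. The function name and dictionary keys are hypothetical; only the bit positions come from the list above, and the fields are returned as raw values without interpretation.

```python
def decode_didr(didr):
    """Split a 32-bit Debug ID Register value into its listed fields."""
    return {
        "designer": (didr >> 24) & 0xFF,      # DIDR[31:24]
        "arch_version": (didr >> 16) & 0xF,   # DIDR[19:16]
        "breakpoints": (didr >> 12) & 0xF,    # DIDR[15:12]
        "watchpoints": (didr >> 8) & 0xF,     # DIDR[11:8]
        "revision": didr & 0xF,               # DIDR[3:0]
    }
```

For example, a made-up register value of 0x41006200 would decode to designer 0x41, architecture version 0, breakpoint field 6, watchpoint field 2 and revision 0.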



R1: Debug Status and Control Register (DSCR)

DSCR[1:0]    Reserved
DSCR[4:2]    Method of debug entry - READ ONLY
             000 JTAG HALT instruction occurred
             001 Register breakpoint occurred
             010 Watchpoint occurred
             011 BKPT instruction occurred
             100 External debug request occurred
             101 Vector catch occurred
             110,111 Reserved
DSCR[5]      Reserved
DSCR[6]      wDTR buffer empty - READ ONLY
             This bit is set if the wDTR buffer is ready to have data written to it, normally
             resulting from a read of the data in the buffer by the JTAG debugger.
             A zero indicates that the data has not yet been read by the debugger.
DSCR[7]      rDTR buffer full - READ ONLY
             This bit is set if there is data in the rDTR for the core to read, normally
             resulting from the JTAG debugger writing data to this buffer. A zero
             indicates that there is no data in the buffer to read.
DSCR[15:8]   Reserved
DSCR[16]     Vector Trap Enable - Reset
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[17]     Vector Trap Enable - Undefined instruction
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[18]     Vector Trap Enable - SWI
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[19]     Vector Trap Enable - Prefetch Abort
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[20]     Vector Trap Enable - Data Abort
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[21]     Reserved
DSCR[22]     Vector Trap Enable - IRQ
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[23]     Vector Trap Enable - FIQ
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[30:24]  Reserved
DSCR[31]     Global Debug Enable - cleared on a system reset
             0 = All debugging functions disabled (breakpoints, watchpoints, etc.)
             1 = All debugging functions enabled.

Note: Vector catch has a higher priority than breakpoints
R2-R4: Reserved

R5: Data Transfer Register

DTR[31:0]    Data

(Note: The DTR physically consists of two separate registers: the rDTR (read) and the
wDTR (write). See 3.2.7 and 3.2.2.10 for the description of their use.)

R6-R63: Reserved 

R64-R79: Register Breakpoint Values

BV[31:0]     Register breakpoint value

R80-R95: Register Breakpoint Control Registers

BCR[0]       Enable - clear on a system reset
             0 = register disabled
             1 = register enabled
BCR[2:1]     Supervisor (Trans) Access
             00 - Reserved
             10 - Privileged
             01 - User
             11 - Either
BCR[4:3]     Thumb mode
             00 - Reserved
             10 - ARM instruction
             01 - Thumb instruction
             11 - Either

R96-R111: Watchpoint Values

WV[31:0]     Watchpoint value

R112-R127: Watchpoint Control Registers

WCR[0]       Enable
             0 = register disabled (Clear on a system reset)
             1 = register enabled
WCR[2:1]     Supervisor (Trans)
             00 - Reserved
             10 - Privileged
             01 - User
             11 - Either
WCR[4:3]     Load/Store/Either
             00 - Reserved
             10 - Load
             01 - Store
             11 - Either
WCR[7:5]     Byte/Halfword/Word/Any Size
             000 - Reserved
             001 - Byte
             010 - Halfword
             011 - Byte or Halfword
             100 - Word
             101 - Word or Byte
             110 - Word or Halfword
             111 - Any Size
WCR[8]       Reserved
WCR[10:9]    Address Mask on Addr[1:0]
             0 = include bits in comparison
             1 = exclude bits in comparison
WCR[11]      Reserved
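
To illustrate how the watchpoint control fields combine, the following sketch assembles a WCR value from the encodings listed above. The helper name, the dictionary names and the keyword strings are hypothetical; the bit positions and encodings are taken from the register description. Reserved bits are left at zero.

```python
# Field encodings copied from the WCR description.
PRIV = {"privileged": 0b10, "user": 0b01, "either": 0b11}
ACCESS = {"load": 0b10, "store": 0b01, "either": 0b11}
SIZE_BYTE, SIZE_HALFWORD, SIZE_WORD = 0b001, 0b010, 0b100


def make_wcr(enable, priv, access, size_bits, addr_mask=0b00):
    """Assemble a Watchpoint Control Register value.

    size_bits is an OR of the SIZE_* flags (0b111 means any size);
    addr_mask covers Addr[1:0], where a 1 excludes the bit from matching.
    """
    if not 1 <= size_bits <= 0b111:
        raise ValueError("size field 000 is reserved")
    if not 0 <= addr_mask <= 0b11:
        raise ValueError("address mask is two bits wide")
    return (
        int(bool(enable))            # WCR[0]
        | (PRIV[priv] << 1)          # WCR[2:1]
        | (ACCESS[access] << 3)      # WCR[4:3]
        | (size_bits << 5)           # WCR[7:5]
        | (addr_mask << 9)           # WCR[10:9]
    )
```

An enabled watchpoint on word stores from either privilege level would be written as make_wcr(True, "either", "store", SIZE_WORD), which evaluates to 0x8F. Setting every matching field to all ones, as recommended when a field is not being considered, gives 0x6FF.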

3.1.3 CP14 Instructions

The following are the only legal instructions for CP14 in ARM10. LDC and STC
instructions are valid only for the DTR register. All other instructions will bounce.

Register   Name   Instruction

R0         ID     MRC p14,0,Rd,c0,c0,0
R1         DSCR   MRC p14,0,Rd,c0,c1,0
                  MCR p14,0,Rd,c0,c1,0
R5         DTR    MRC p14,0,Rd,c0,c5,0
                  MCR p14,0,Rd,c0,c5,0
                  LDC p14,c5,<addressing mode>
                  STC p14,c5,<addressing mode>
R64        BV     MRC p14,0,Rd,c0,c0,4
                  MCR p14,0,Rd,c0,c0,4
R65        BV     MRC p14,0,Rd,c0,c1,4
                  MCR p14,0,Rd,c0,c1,4
R66        BV     MRC p14,0,Rd,c0,c2,4
                  MCR p14,0,Rd,c0,c2,4
R67        BV     MRC p14,0,Rd,c0,c3,4
                  MCR p14,0,Rd,c0,c3,4
R68        BV     MRC p14,0,Rd,c0,c4,4
                  MCR p14,0,Rd,c0,c4,4
R69        BV     MRC p14,0,Rd,c0,c5,4
                  MCR p14,0,Rd,c0,c5,4
R80        BCR    MRC p14,0,Rd,c0,c0,5
                  MCR p14,0,Rd,c0,c0,5
R81        BCR    MRC p14,0,Rd,c0,c1,5
                  MCR p14,0,Rd,c0,c1,5
R82        BCR    MRC p14,0,Rd,c0,c2,5
                  MCR p14,0,Rd,c0,c2,5
R83        BCR    MRC p14,0,Rd,c0,c3,5
                  MCR p14,0,Rd,c0,c3,5
R84        BCR    MRC p14,0,Rd,c0,c4,5
                  MCR p14,0,Rd,c0,c4,5
R85        BCR    MRC p14,0,Rd,c0,c5,5
                  MCR p14,0,Rd,c0,c5,5
R96        WV     MRC p14,0,Rd,c0,c0,6
                  MCR p14,0,Rd,c0,c0,6
R97        WV     MRC p14,0,Rd,c0,c1,6
                  MCR p14,0,Rd,c0,c1,6
R112       WCR    MRC p14,0,Rd,c0,c0,7
                  MCR p14,0,Rd,c0,c0,7
R113       WCR    MRC p14,0,Rd,c0,c1,7
                  MCR p14,0,Rd,c0,c1,7
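
For a debugger that must scan these instructions in as 32-bit words, the register moves can be encoded as in the sketch below. This is an illustration under stated assumptions: it uses the standard ARM MRC/MCR coprocessor register transfer encoding, which is not reproduced in this document, with opcode1 = 0 and CRn = c0 as in the list above. The function name is hypothetical.

```python
def encode_cp14_move(is_mrc, rd, debug_reg, cond=0xE):
    """Encode 'MRC/MCR p14,0,Rd,c0,CRm,opcode2' as a 32-bit ARM word.

    debug_reg is the CP14 debug register number {opcode2,CRm};
    cond defaults to 0xE (always).
    """
    if not 0 <= rd <= 15:
        raise ValueError("Rd must be R0..R15")
    if not 0 <= debug_reg <= 127:
        raise ValueError("debug register must be 0..127")
    opcode2, crm = debug_reg >> 4, debug_reg & 0xF
    return (
        (cond << 28)
        | (0b1110 << 24)         # coprocessor register transfer
        | (0 << 21)              # opcode1 = 0
        | (int(is_mrc) << 20)    # L bit: 1 = MRC (read), 0 = MCR (write)
        | (0 << 16)              # CRn = c0
        | (rd << 12)
        | (14 << 8)              # coprocessor number p14
        | (opcode2 << 5)
        | (1 << 4)
        | crm
    )
```

With this encoding, MRC p14,0,R0,c0,c5,0 (a read of the DTR) becomes 0xEE100E15 and MCR p14,0,R0,c0,c0,4 (a write of breakpoint value register R64) becomes 0xEE000E90.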

3.2 The Hardware Interface to Debug

3.2.1 Entering and Exiting Halt Mode

Halt mode is enabled by writing a 1 to bit 30 of the DSCR, which can only be done by
the JTAG debugger. When this mode is enabled, the processor will halt (as opposed
to taking an exception in software) if one of the following events occurs:

a) EDBGRQ is asserted
b) A HALT instruction has been scanned in through the JTAG interface. The TAP
   controller must pass through Run-Test/Idle in order to issue the HALT command
   to the core.
c) An exception occurs and the corresponding vector trap enable bit is set
d) A register breakpoint hits
e) A watchpoint hits
f) A BKPT instruction reaches the execute stage of the ARM pipeline

The Core Halted bit in the DSCR is set when debug state is entered. Presumably, the
debugger will poll the DSCR by going through Capture-DR and Shift-DR until it sees
this bit go high. At this point, the debugger determines why the core was halted and
preserves the machine state. The MSR instruction can be used to change modes and
gain access to all banked registers in the machine. While in debug state, the PC is not
incremented, external interrupts are ignored, and all instructions are read from the
Instruction Transfer Register (scan chain 4).

Exiting from debug state is performed by scanning in the RESTART instruction through
the JTAG interface. The debugger will adjust the PC before restarting, depending on the
manner in which the core entered debug state. The table below indicates the value of the
PC at the time the core halted for each case.

                          ARM     Thumb
EDBGRQ asserted           PC+8    PC+4
HALT instruction          PC+8    PC+4
Vector Trap               PC+8    PC+4
Register Breakpoint       PC+8    PC+4
Instruction Breakpoint    PC+8    PC+4
Watchpoint                PC+8    PC+4

When the state machine enters the Run-Test/Idle state, normal operations will resume.
The delay, waiting until the state machine is in Run-Test/Idle, allows conditions to be
set up in other devices in a multiprocessor system without taking immediate effect.
When Run-Test/Idle state is entered, all the processors resume operation simultaneously.
The Core Restarted bit will be set when the RESTART sequence is complete.

Note that before the debugger issues a RESTART command, it should poll the "Instruction
Complete" bit (wDTR[0]) to ensure that the last instruction completes without any
problems (potential aborts and whatnot). After issuing a RESTART instruction to the
core, the debugger must poll to see that the core has indeed restarted before doing
anything else. There are synchronizers and handshake lines in the debug logic which
must have a clock (specifically, TCK) to allow the clearing of those handshake lines.
If the clock is turned off before the debug logic has a chance to clear down the
DbgRestart line, then the core will remain in debug state and not start up again. The
very act of reading the Core Restarted bit gives enough clocks to clear down the
necessary lines.

3.2.2 The JTAG Port and Test Data Registers

The JTAG portion of the logic will implement the IEEE 1149.1 interface and support
a Device ID Register, a Bypass Register, and a 4-bit Instruction Register. In addition,
the following public instructions will be supported:





Instruction       Binary Code

EXTEST            0000
SCAN_N            0010
SAMPLE/PRELOAD    0011
CLAMP             0101
HIGHZ             0111
CLAMPZ            1001
IDCODE            1110
BYPASS            1111
INTEST            1100
RESTART           0100
HALT              1000



Access to the debug registers can be obtained through either software (with MCR
instructions) or through the JTAG port. Fundamentally the hardware debug
mechanism is similar to ARM7/ARM9, but ARM10 debug hides all clocking and
pipeline depth issues from the debugger.

Registers in CP14 which are accessible via JTAG (R1, R5) are written using an
EXTEST instruction. The registers R0, R1, and R5 are read with either the INTEST or
EXTEST instruction. This differs from ARM9 in that only the INTEST instruction
was used and a r/w bit in the chain determined the operation to be performed.

3.2.2.1 Bypass Register

Purpose: Bypasses the device during scan testing by providing a path between TDI
and TDO.
Length: 1 bit
Operating mode: When the bypass instruction is the current instruction in the
instruction register, serial data is transferred from TDI to TDO in the Shift-DR state
with a delay of one TCK cycle. There is no parallel output from the bypass register. A
logic 0 is loaded from the parallel input of the bypass register in Capture-DR state.
Order: TDI-[0]-TDO

3.2.2.2 Device ID Code Register

Purpose: In order to distinguish the ARM10 from ARM7T and ARM9T, the TAP
controller ID will be unique so that Multi-ICE can easily see to which processor it's
connected. This ID register will be routed to the edge of the chip so that partners can
create their own ID numbers by tying the pins to high or low values. The generic ID for
ARM10200 will initially be 0x01020F0F. All partner-specific devices will be identified
by the ID numbers of the following form:

Version    Part Number    Manufacturer ID    LSB
[31:28]    [27:12]        [11:1]             1

Length: 32 bits
Operating mode: When the IDCODE instruction is current, the ID register is selected
as the serial path between TDI and TDO. There is no parallel output from the ID
register. The 32-bit ID code is loaded into the register from its parallel inputs during
the Capture-DR state.
Order: TDI-[31][30]...[1][0]-TDO
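
The ID code layout can be checked against the generic ARM10200 value with a small sketch. The function name and dictionary keys are hypothetical; the field boundaries are those given in the Device ID Code Register description.

```python
def decode_idcode(idcode):
    """Split a 32-bit JTAG device ID code into the fields listed above."""
    return {
        "version": (idcode >> 28) & 0xF,           # [31:28]
        "part_number": (idcode >> 12) & 0xFFFF,    # [27:12]
        "manufacturer_id": (idcode >> 1) & 0x7FF,  # [11:1]
        "lsb": idcode & 1,                         # always 1
    }
```

Applied to 0x01020F0F this gives version 0, part number 0x1020, manufacturer ID 0x787, and a least significant bit of 1.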

3.2.2.3 Instruction Register

Purpose: Changes the current TAP instruction
Length: 4 bits
Operating mode: When in Shift-IR state, the instruction register is selected as the
serial path between TDI and TDO. During the Capture-IR state, the value 0001
binary is loaded into this register. This is shifted out during Shift-IR (least significant
bit first), while a new instruction is shifted in (least significant bit first). During the
Update-IR state, the value in the instruction register becomes the current instruction.
On reset, the IDCODE becomes the current instruction.
Order: TDI-[3][2][1][0]-TDO
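
The least-significant-bit-first ordering can be made concrete with a short sketch that lists the bits presented on TDI for each public instruction. The dictionary and function names are hypothetical; the binary codes are those of the public instruction table in 3.2.2.

```python
# Public instruction codes from the table in 3.2.2.
JTAG_IR = {
    "EXTEST": 0b0000, "SCAN_N": 0b0010, "SAMPLE/PRELOAD": 0b0011,
    "CLAMP": 0b0101, "HIGHZ": 0b0111, "CLAMPZ": 0b1001,
    "IDCODE": 0b1110, "BYPASS": 0b1111, "INTEST": 0b1100,
    "RESTART": 0b0100, "HALT": 0b1000,
}


def ir_shift_bits(name):
    """Return the four TDI bits for an instruction, in shift order.

    Bit [0] is shifted first, matching the 'least significant bit first'
    rule for the 4-bit instruction register.
    """
    code = JTAG_IR[name]
    return [(code >> i) & 1 for i in range(4)]
```

For instance, HALT (1000) is shifted in as 0, 0, 0, 1 and INTEST (1100) as 0, 0, 1, 1.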

3.2.2.4 Scan Chain Select Register

Purpose: Changes the current active scan chain
Length: 5 bits
Operating mode: After SCAN_N has been selected as the current instruction, when in
Shift-DR state, the Scan Chain Select register is selected as the serial path between
TDI and TDO. During the Capture-DR state, the value 10000 binary is loaded into
this register. This is shifted out during Shift-DR (least significant bit first), while a
new value is shifted in (least significant bit first). During the Update-DR state, the
value in the register selects a scan chain to become the currently active scan chain. All
further instructions such as INTEST then apply to that scan chain. The currently
selected scan chain only changes when a SCAN_N instruction is executed, or a reset
occurs. On reset, scan chain 3 is selected as the active scan chain. The number of the
currently selected scan chain is reflected on the SCREG[4:0] output bus. The TAP
controller may be used to drive external scan chains in addition to those within the
ARM1020 macrocell. The external scan chain must be assigned a number, and control
signals for it can be derived from SCREG[4:0], IR[3:0], TAPSM[3:0], and TCK.
Order: TDI-[4][3][2][1][0]-TDO

3.2.2.5 Scan Chain 0

Purpose: Debug
Length: 32 bits
This scan chain is CP14 Register 0, the Debug ID Register.
Order: TDI-[31][30]...[1][0]-TDO

3.2.2.6 Scan Chain 1

Purpose: Debug
Length: 32 bits
This scan chain is CP14 Register 1, the DSCR. Note that bits DSCR[15:0] are read
only. The following bits are defined for Chain 1:

DSCR[0]      Core Halted - READ ONLY
DSCR[1]      Core Restarted - READ ONLY
DSCR[4:2]    Method of debug entry - READ ONLY
             000 JTAG HALT instruction occurred
             001 Register breakpoint occurred
             010 Watchpoint occurred
             011 BKPT instruction occurred
             100 External debug request occurred
             101 Vector catch occurred
             110,111 Reserved
DSCR[5]      Abort occurred sometime in the past - WRITABLE ONLY WITH AN MCR
             This bit is sticky; it's cleared with an MCR to the DSCR where this bit
             is a zero.
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[6]      wDTR buffer empty - READ ONLY
             This bit is the core's indicator that the wDTR buffer is empty, meaning that
             the core can write more data into it. This is the inversion of the bit that
             the JTAG debugger would see if it were to poll the DTR by going
             through Capture-DR with EXTEST. The debugger should not use this
             bit to determine if the wDTR is empty or full, as the timing between the
             JTAG signal and the core signal is different.
DSCR[7]      rDTR buffer full - READ ONLY
             This bit is the core's indicator that the rDTR buffer is full, meaning that the
             debugger has written data into it. This is the inversion of the bit that the
             JTAG debugger would see if it were to poll the DTR by going through
             Capture-DR with INTEST. The debugger should not use this bit to
             determine if the rDTR is empty or full, as the timing between the JTAG
             signal and the core signal is different.
DSCR[15:8]   Reserved
DSCR[16]     Vector Trap Enable - Reset - READ ONLY
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[17]     Vector Trap Enable - Undefined Instruction - READ ONLY
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[18]     Vector Trap Enable - SWI - READ ONLY
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[19]     Vector Trap Enable - Prefetch Abort - READ ONLY
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[20]     Vector Trap Enable - Data Abort - READ ONLY
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[21]     Vector Trap Enable - Reserved
DSCR[22]     Vector Trap Enable - IRQ - READ ONLY
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[23]     Vector Trap Enable - FIQ - READ ONLY
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[26:24]  Reserved
DSCR[27]     Comms Channel Mode
             0 = No comms channel activity
             1 = Comms channel activity
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[28]     Thumb mode indicator (see Section 5.0)
DSCR[29]     Execute Instruction in ITR select
             0 = Disabled
             1 = Instruction in ITR is sent to prefetch unit if JTAG state machine
             passes through Run-Test/Idle
             Set when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[30]     Halt/Monitor mode select
             0 = Monitor mode enabled
             1 = Halt mode enabled.
             Reset when nTRST = 0 or if the TAP controller is in the Reset state.
DSCR[31]     Global Debug Enable - cleared on a system reset
             0 = All debugging functions disabled (breakpoints, watchpoints, etc.)
             1 = All debugging functions enabled.
             Reset when nRESET = 0 (the core's reset line)

Note that the comms channel bits, rDTR Full and wDTR Empty, are inversions of
what the debugger sees, as these bits are mirrored in the DSCR for the core's use, not
the debugger's.

Order: TDI-[31][30]...[1][0]-TDO
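
A debugger that has captured the DSCR through scan chain 1 might interpret it along the lines of the following sketch. The function and key names are hypothetical, and only a subset of the fields defined for chain 1 is decoded.

```python
# Method-of-debug-entry encodings, DSCR[4:2].
ENTRY_METHOD = {
    0b000: "JTAG HALT instruction",
    0b001: "register breakpoint",
    0b010: "watchpoint",
    0b011: "BKPT instruction",
    0b100: "external debug request",
    0b101: "vector catch",
}


def decode_dscr(dscr):
    """Decode selected DSCR fields as seen through scan chain 1."""
    return {
        "core_halted": bool(dscr & 1),               # DSCR[0]
        "core_restarted": bool((dscr >> 1) & 1),     # DSCR[1]
        "entry": ENTRY_METHOD.get((dscr >> 2) & 0b111, "reserved"),
        "autoexecute": bool((dscr >> 29) & 1),       # DSCR[29]
        "halt_mode": bool((dscr >> 30) & 1),         # DSCR[30]
        "global_enable": bool((dscr >> 31) & 1),     # DSCR[31]
    }
```

As an example, the made-up value 0xC000000D decodes as a halted core in halt mode, with global debug enabled, that entered debug state through a BKPT instruction.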

3.2.2.7 Scan Chain 2

Purpose: Debug
Length: 65 bits
This scan chain is the combination of CP14 Registers 4 and 5. Note that the
Instruction Complete bit in Register 4 is not included in this chain. It only appears in
chain 4.
Order: TDI-Reg4[32]Reg4[31]...Reg4[1]Reg5[32]Reg5[31]...Reg5[0]-TDO

3.2.2.8 Scan Chain 3

Purpose: Can be used for external boundary scan testing. Used for inter-device testing
(EXTEST) and testing the core (INTEST).
Length: undetermined

3.2.2.9 Scan Chain 4

Purpose: Debug
Length: 33 bits
This scan chain is the Instruction Transfer Register, used to send instructions to the core
via the prefetch unit. This chain consists of 32 bits of information, plus an additional bit
to indicate the completion of the instruction sent to the core.
Order: TDI-[32][31][30]...[1][0]-TDO

3.2.2.10 Scan Chain 5

Purpose: Debug
Length: 33 bits
This scan chain is CP14 Register 5, the Data Transfer Register. This register
physically consists of two separate registers: the read-only DTR (rDTR) and the
write-only DTR (wDTR). This register has been separated to facilitate the creation of a
bidirectional comms channel in software. The rDTR can only be loaded via the JTAG
port and read only by the core via an MRC instruction. The wDTR can only be loaded
by the core through an MCR instruction and read only via the JTAG port. From the
TAP controller's perspective, it only sees one register (Chain 5), but the appropriate
register is chosen depending on which instruction is used (INTEST or EXTEST).

The wDTR chain itself contains 32 bits of information plus one additional bit for the
comms channel. The definition of bit 0 depends on whether the current JTAG
instruction is INTEST or EXTEST. If the current instruction is EXTEST, the
debugger can write to the rDTR, and bit 0 will indicate if there is still valid data in the
queue. If the bit is clear, the debugger can write new data. When the core performs a
read of the DTR, bit 0 is automatically cleared. Conversely, if the JTAG instruction is
INTEST, bit 0 indicates if there is currently valid data to read in the wDTR. If the bit
is set, JTAG should read the contents of the wDTR, which in turn clears the bit. The
core can then sample bit 0 and write new data once the bit is clear again.

The rDTR chain contains 32 bits of information plus one additional bit for the
comms channel.

Order: TDI-rDTR[32]rDTR[31]...rDTR[1]rDTR[0]
       wDTR[32]wDTR[31]...wDTR[1]wDTR[0]-TDO
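
The handshake between the two halves of the DTR can be pictured with a small behavioural model. This is a sketch only: the class and method names are hypothetical, the valid flags stand in for bit 0 of each chain, and the synchronizers and timing differences between the JTAG and core clock domains are deliberately ignored.

```python
class DTRChannel:
    """Behavioural model of the split rDTR/wDTR comms channel."""

    def __init__(self):
        self.rdtr, self.rdtr_valid = 0, False   # debugger -> core
        self.wdtr, self.wdtr_valid = 0, False   # core -> debugger

    # Debugger side (JTAG).
    def jtag_write(self, value):
        """EXTEST: load the rDTR only if the previous word was consumed."""
        if self.rdtr_valid:
            return False
        self.rdtr, self.rdtr_valid = value & 0xFFFFFFFF, True
        return True

    def jtag_read(self):
        """INTEST: return the wDTR word if valid, clearing the flag."""
        if not self.wdtr_valid:
            return None
        self.wdtr_valid = False
        return self.wdtr

    # Core side (CP14 register 5).
    def core_read(self):
        """MRC from the DTR: reading clears the rDTR valid flag."""
        self.rdtr_valid = False
        return self.rdtr

    def core_write(self, value):
        """MCR to the DTR: the word becomes visible to the debugger."""
        self.wdtr, self.wdtr_valid = value & 0xFFFFFFFF, True
```

In this model a second jtag_write is refused until the core has performed core_read, and jtag_read returns nothing until the core has performed core_write, which mirrors the polling described for bit 0.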

3.2.2.11 Scan Chains 6-15
Reserved

3.2.2.12 Scan Chains 16-31
Unassigned

3.2.3 Sending Instructions to the Core 

Two registers in CP14 are used to communicate with the ARM10 processor, the
Instruction Transfer Register (ITR) and the Data Transfer Register (DTR). The ITR is
used to jam an instruction into the processor's pipeline. While in debug state, most of
the processor's time is spent effectively executing invalid instructions until the ITR is
ready. In hardware debug state, the PC is not incremented as instructions are executed;
however, branches will still modify the PC.

DSCR[29] controls an autoexecute function. When this bit is set, each time the JTAG
TAP controller enters the Run-Test/Idle state, the instruction currently residing in the
ITR is sent to the prefetch unit for execution by the core. If this bit is clear, no
instruction will be passed to the prefetch unit. The instruction in the JTAG IR register
must be either INTEST or EXTEST.

The autoexecute feature allows for fast uploads and downloads of data. For example, a
download sequence might consist of the following. Initially, scan chain 2, the
combination of scan chains 4 and 5, is selected in the ScanNReg, then the JTAG
instruction is set to EXTEST for writing. A core write instruction (an STC) and the
associated data are serially scanned into the ITR and DTR, respectively. When the TAP
controller passes through the Run-Test/Idle state, the instruction in the ITR is issued to
the core. Next, the scan chain can be switched to the DTR only (chain 5) and polled by
going through the Capture-DR state, then the Shift-DR state. The least significant bit in
the chain, which is bit wDTR[0], is examined until this status bit indicates the
completion of the instruction. More data can then be loaded into DTR and the
instruction re-executed by passing through Run-Test/Idle. Here, we also assume that the
STC instruction specifies base address writeback so that the addresses are automatically
updated.
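
The ordering of debugger actions in this download sequence can be sketched as a simple step list. This is an illustration only: the function name and the step labels are hypothetical, the autoexecute bit DSCR[29] is assumed to be set already, and the JTAG instruction that must be reloaded after the second SCAN_N is not shown because the description above does not name it.

```python
def download_steps(stc_word, data_words):
    """Return the ordered debugger actions for an autoexecute download.

    stc_word is the encoded STC (with base-address writeback) placed in
    the ITR; data_words are the 32-bit values to be stored in turn.
    """
    if not data_words:
        raise ValueError("need at least one data word")
    steps = [
        ("SCAN_N", 2),                            # chain 2 = ITR + DTR
        ("IR", "EXTEST"),                         # EXTEST for writing
        ("SHIFT_DR", (stc_word, data_words[0])),  # instruction and data
        ("RUN_TEST_IDLE", None),                  # issues the ITR contents
        ("SCAN_N", 5),                            # DTR only from here on
        ("POLL_COMPLETE", None),                  # Capture-DR / Shift-DR
    ]
    for word in data_words[1:]:
        steps += [
            ("SHIFT_DR", word),                   # next word into the DTR
            ("RUN_TEST_IDLE", None),              # re-executes the same STC
            ("POLL_COMPLETE", None),
        ]
    return steps
```

Each additional word costs one data scan, one pass through Run-Test/Idle and one poll; the instruction itself is scanned only once.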

To increase the performance of upload, a similar mechanism can be used. First, the
JTAG instruction is changed to EXTEST. Using chain 2, a read instruction such as
LDC can be scanned into the ITR. Then, the JTAG instruction is switched to INTEST
for reading. The scan chain can then be switched to the DTR and polled until the
instruction completes. By passing through the Run-Test/Idle state on the way to
Shift-DR (for polling), the instruction in the ITR is issued to the core. This process is then
repeated until the last word is read.

Having the instruction executed by going through Run-Test/Idle addresses the problem
of running the core clock at a frequency close to that of the JTAG clock. If the
instruction has been issued to the core and the data is not yet available for capture, the
emulator can simply go around the state machine loop again and poll until the data is
available. Once there, it can swing around the state machine loop once more and capture
the data, then scan it out. Placing the autoexecute mechanism on any other state in the
inner loop forces another instruction to be dispatched too early, possibly overwriting
other data. Run-Test/Idle sits outside the inner loop and is only one state transition away,
incurring little penalty for having to go through it. For systems where the processor
clock is significantly faster than the JTAG clock, the data will normally be available
well before the TAP controller gets to Capture-DR from Run-Test/Idle, so it will pass
through the inner loop one time, capturing the data then scanning it out.

Note that because CP14 does not monitor the busses in the same manner that ARM9
did, reading the contents of the core's register file requires individual moves from an
ARM register to CP14 Register 5 instead of using LDM/STM instructions. The
information can then be scanned out of the DTR.

Byte and halfword transfers are performed by transferring both the address and data
into the processor and then executing the appropriate ARM instructions.

Transfers to and from coprocessors can be performed by moving data via an ARM
register. This implies that all ARM10 coprocessors should have all data accessible via
MRC and MCR (otherwise a data buffer in writeable memory must be used).

3.2.4 Reading and Writing Breakpoint and Watchpoint Registers 

Hardware breakpoints and watchpoints are written by transferring the data to an ARM
register and then moving the data to the appropriate breakpoint or watchpoint register.
As an example, consider loading breakpoint register R64:

Scan into ITR: MRC p14,0,Rd,c0,c5,0
Scan into DTR: Data to be loaded into Breakpoint Register R64
Command is executed
Scan into ITR: MCR p14,0,Rd,c0,c0,4

In the above example, the first MRC instruction moves data from the DTR register
(R5 in CP14) to another register in ARM. Once this data is moved, an MCR
instruction transfers the data from the ARM register into the breakpoint register (R64
in CP14).

The opposite process can be used to read a breakpoint register. The breakpoint and
watchpoint registers are not directly accessible from a scan chain to minimize the
implementation cost.
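
The write sequence described in this section generalizes to any breakpoint value register, as the following sketch suggests. The function name and step labels are hypothetical, the instructions are kept as assembly text rather than encoded words, and the final execute step is an assumption that follows from the MCR having to run before the breakpoint register is updated.

```python
def load_breakpoint_steps(bp_index, value, rd=0):
    """List the debugger steps that load breakpoint value register bp_index.

    Breakpoint value registers are R64-R79, so the register number is
    64 + bp_index, split into {opcode2,CRm} for the MCR.
    """
    if not 0 <= bp_index <= 15:
        raise ValueError("breakpoint index must be 0..15")
    reg = 64 + bp_index
    opcode2, crm = reg >> 4, reg & 0xF
    return [
        ("ITR", f"MRC p14,0,R{rd},c0,c5,0"),   # DTR (R5) -> ARM register
        ("DTR", value & 0xFFFFFFFF),           # data for the breakpoint
        ("EXECUTE", None),
        ("ITR", f"MCR p14,0,R{rd},c0,c{crm},{opcode2}"),
        ("EXECUTE", None),                     # assumed final execute
    ]
```

For breakpoint 0 the fourth step is MCR p14,0,R0,c0,c0,4, matching the R64 example; for breakpoint 5 it becomes MCR p14,0,R0,c0,c5,4.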

The Instruction Address that gets issued by the prefetch unit always has bit 0 set to zero.
In Thumb mode, bit 1 represents the odd wordiness of an address, while in ARM mode,
this bit is also set to zero. Although breakpoint registers contain a full 32-bit field for
comparison, for breakpoints and watchpoints in Thumb mode, the user should take care
not to set a value in the register which would never match, i.e., bit 0 is a one.

The bits in a breakpoint and watchpoint control register should be self-explanatory.

3.2.5 Software Lockout Function 

When the JTAG debugger is attached to an evaluation board or test system, it will
indicate its presence by setting the Halt/Monitor Mode bit in the DSCR. At this point,
breakpoint and watchpoint registers can be written and read by the debugger while in
Halt Mode. Once breakpoint and watchpoint registers have been configured, software
cannot alter them from the processor side if the Halt/Monitor Mode bit remains high as
the debugger retains control. The core can still write to the comms channel register,
however.

3.2.6 External Signals 

There is one external signal associated with debug: EDBGRQ, with which the system
requests the ARM1020 to enter debug state. External logic may wish to use this line to
halt the ARM1020 in a multiprocessor system or at startup to immediately force the
processor into debug state.



3.2.7 Saving and Restoring Processor State

Before debugging, the emulator must save control settings, register values, or other state
that might get altered during the course of emulation. To this end, care needs to be taken
to restore all conditions back to their original state before leaving debug. The PC value
that is read out after debug entry will be PC+0x8 for all cases, i.e., vector catches, BKPT
instructions, register breakpoints, HALT instructions from JTAG, etc.

Because the DTR has been split into two registers, it's necessary to save the rDTR and
wDTR state information. A save and restore sequence might look like:

(hardware executes a HALT instruction through JTAG)
Poll until Core Halted is asserted
Once asserted, capture the wDTR and scan it out
Change the JTAG IR to EXTEST and scan junk into the rDTR - this forces the rDTR
status bit out
(save other registers)

(finished debugging)
Scan the old CPSR into the rDTR
Load the ITR with an MRC which transfers the rDTR into R0 and execute
Load the ITR with an MSR which transfers R0 to CPSR
Scan the old PC into rDTR
Load the ITR with an MRC which transfers the rDTR into R0 and execute
Load the ITR with a MOV which transfers R0 into the PC
Scan the old R0 into the rDTR
Load the ITR with an MRC which transfers the rDTR into R0
(restore registers)
If the saved rDTR status bit indicated it was full, scan old rDTR information into the
rDTR
Issue RESTART command
Poll until Core Restarted is asserted.

3.3 The Software Interface to Debug

Monitor mode describes those ARM operations that are used to configure register
breakpoints, respond to those breakpoints, and even halt the core. Monitor mode is
also useful in real-time systems when the core cannot be halted in order to collect
information. Examples are engine controllers and servo mechanisms in hard drive
controllers that cannot stop the code without physically damaging the components.
For situations that can tolerate a small intrusion into the instruction stream, monitor
mode is ideal. Using this technique, code can be suspended with an interrupt long
enough to save off state information and important variables. The code continues once
the exception handler is finished.

15 

3.3.1 Entering and Exiting Monitor Mode 

Monitor mode is enabled by writing a 0 to bit 30 of the DSCR. When monitor mode is
enabled, the processor takes an exception (rather than halting) if one of the following
events occurs:

1) A register breakpoint is hit
2) A watchpoint is hit
3) A breakpoint instruction reaches the execute stage of the ARM pipeline
4) An exception is taken and the corresponding vector trap bit is set

Note that the Global Debug Enable bit in the DSCR must be set or no action is taken.
Exiting the exception handler should be done in the normal fashion, e.g., restoring the
PC to (R14-0x4) for prefetch exceptions, moving R14 into the PC for BKPT
instructions because they're skipped, etc. The table below indicates the value of the PC
at the time the core takes the exception.

Event                     ARM     Thumb

Vector Trap               PC+8    PC+4
Register Breakpoint       PC+8    PC+4
Instruction Breakpoint    PC+8    PC+4
Watchpoint                PC+8    PC+8
Data Abort                PC+8    PC+8
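
As an illustration of how an exception handler might use these offsets, the following Python sketch recovers the address of the instruction associated with an event from the PC value observed at exception entry. It assumes that "PC" in the table denotes that instruction's address. The dictionary keys and the function name triggering_address are hypothetical and are not part of the ARM10 specification.

```python
# Byte offsets between the PC value seen at exception entry and the
# address of the associated instruction, as listed in the table.
PC_OFFSET = {
    "vector_trap":            {"arm": 8, "thumb": 4},
    "register_breakpoint":    {"arm": 8, "thumb": 4},
    "instruction_breakpoint": {"arm": 8, "thumb": 4},
    "watchpoint":             {"arm": 8, "thumb": 8},
    "data_abort":             {"arm": 8, "thumb": 8},
}


def triggering_address(pc_at_entry, event, state):
    """Return the 32-bit instruction address implied by the PC at entry.

    event and state are hypothetical keys into PC_OFFSET; state is
    "arm" or "thumb".
    """
    return (pc_at_entry - PC_OFFSET[event][state]) & 0xFFFFFFFF
```

For example, a register breakpoint taken in ARM state with an observed PC of 0x8008 would map back to address 0x8000 under this assumption.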



3.3.2 Reading and Writing Breakpoint and Watchpoint Registers

When in monitor mode, all breakpoint and watchpoint registers can be read and written 
with MRC and MCR instructions from a privileged processing mode. 

For a description of the register field encodings, see Sections and 3.12 and 3.2.4. 


3.3.3 The BKPT Instruction 

The ARM debug architecture defines breakpoint instructions for both ARM and Thumb.
Execution of one of these instructions has the same effect as hitting a register
breakpoint. In monitor mode, a prefetch abort is taken; in halt mode, the core halts. Each
debug instruction has an unused field, 8 bits for Thumb and 12 bits for ARM, that can
be used by the debugger to identify individual breakpoints.

The ARM opcode is 32'hE12xxx7x; the Thumb opcode is 16'hBExx, where x
designates an unused field.
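
The use of the unused field as a breakpoint identifier can be sketched in Python as follows. This is only an illustration under stated assumptions: the 12-bit ARM tag is taken to occupy the three "x" digits between "E12" and "7" (bits 19:8), the final "x" digit is left at zero, and the function names are hypothetical.

```python
ARM_BKPT_BASE = 0xE1200070   # fixed bits of 32'hE12xxx7x
ARM_BKPT_MASK = 0xFFF000F0   # bits that must match the base pattern
THUMB_BKPT_BASE = 0xBE00     # fixed bits of 16'hBExx


def arm_bkpt(tag):
    """Build an ARM BKPT opcode carrying a 12-bit debugger tag."""
    if not 0 <= tag <= 0xFFF:
        raise ValueError("ARM tag must fit in 12 bits")
    return ARM_BKPT_BASE | (tag << 8)


def thumb_bkpt(tag):
    """Build a Thumb BKPT opcode carrying an 8-bit debugger tag."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("Thumb tag must fit in 8 bits")
    return THUMB_BKPT_BASE | tag


def arm_bkpt_tag(opcode):
    """Recover the 12-bit tag from an ARM BKPT opcode."""
    if opcode & ARM_BKPT_MASK != ARM_BKPT_BASE:
        raise ValueError("not a BKPT opcode of the form E12xxx7x")
    return (opcode >> 8) & 0xFFF
```

A debugger could, for instance, plant arm_bkpt(n) for its n-th breakpoint and use arm_bkpt_tag on the fetched opcode to identify which breakpoint was reached.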

3.3.4 The Comms Channel 

The comms channel in ARM10 has been implemented by using the two physically
separate Data Transfer Registers (DTRs) and a bit to augment each register, creating a
bidirectional serial port. The extra bit indicates that valid data resides in the data register.
By convention, the target software owns the write DTR (wDTR) and the host software
owns the read DTR (rDTR). In other words, the wDTR is written by the core and the
information is then scanned out through the JTAG port by the host. Since the wDTR is
the only register with a TDO connection, bit 0 of scan chain 5 is chosen by the current
instruction (either INTEST or EXTEST) in the JTAG Instruction Register. When doing
debug comms channel activities, bit 27 of the DSCR is set to indicate to the debug logic
that the least significant bit of the wDTR now indicates the state of the comms channel
registers rather than the completion of instructions.

When the debugger is reading the data meant for it, INTEST is loaded into the IR and
the contents of the wDTR are scanned out. If the least significant bit of the 33-bit packet
of data is set, the data is valid. Bit 0 in the wDTR is then cleared by this read. If the bit is
cleared, meaning that the core has not written any new data, the debugger may wish to
poll the DSCR to see if the core has halted.
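
On the host side, handling of the 33-bit packet might be sketched as below. The text states only that the least significant bit is the valid flag; placing the 32 data bits in bits 32:1 is an assumption made for this illustration, and unpack_wdtr_packet is a hypothetical name.

```python
def unpack_wdtr_packet(packet):
    """Split a 33-bit value scanned out of the wDTR into (valid, data).

    Bit 0 is the valid flag described in the text. The data word is
    assumed (not stated in the source) to sit in bits 32:1.
    """
    if not 0 <= packet < (1 << 33):
        raise ValueError("packet must fit in 33 bits")
    valid = bool(packet & 1)
    data = (packet >> 1) & 0xFFFFFFFF
    return valid, data
```

When valid is False the data word carries nothing new, and the debugger may instead poll the DSCR as described above.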

Similarly, EXTEST is used to write data into the rDTR by the debugger, and this
operation sets bit 0 for this register, indicating valid data. What the debugger sees is
actually the inversion of this bit, so when the debugger goes to write more data, bit 0
should be checked to see that it is set, meaning the core has read the rDTR. If the bit was a
zero, indicating that the rDTR is still full and the core has not read old data, then the
new data shifted in is not loaded into the rDTR. If after doing this many times, the
debugger wants to check if the core has halted, the status bit remains valid, as well as the
data in the rDTR. This must be scanned out and replied once the debugger saves off
state information, only to be scanned back in at a later time. This act of moving the data
from the rDTR into a core register then into the wDTR to be scanned out will clear
down the state machines which control the rDTR full bit.


These extra bits are actually reflected in the DSCR, so that the core can use MRCs to
read them. Note, however, that the bits are inversions of those seen by the debugger,
since they are for the core's use.

Because halt mode and monitor mode are mutually exclusive, the transfer registers are
not used for any other purpose in monitor mode.

4.0 Debug and Exceptions, Vector Catching 

4.1 Instruction Breakpoints

Instruction breakpoints will be clocked into the core at the same time as instruction
data. The breakpoint will be taken as soon as the instruction enters the execute stage
of the pipeline, assuming an abort is not pending. The breakpoint is taken whether or
not the instruction would have failed its condition code.

A breakpointed instruction may have a prefetch abort associated with it. If so, the
prefetch abort takes priority and the breakpoint is ignored. SWI and undefined
instructions are treated in the same way as any other instruction which may have a
breakpoint set on it. Therefore, the breakpoint takes priority over the SWI or undefined
instruction.

On an instruction boundary, if there is a breakpointed instruction and an interrupt (FIQ
or IRQ), the interrupt is taken and the breakpointed instruction is discarded. Once the
interrupt has been serviced, the execution flow is returned to the original program.
This means that the instruction which was previously breakpointed is fetched again,
and if the breakpoint is still set, the processor enters debug state once it reaches the
execute stage of the pipeline.



4.2 Watchpoints

Entry into debug state following a watchpointed memory access is imprecise relative to
the instruction stream accesses. This is due to the nature of the pipeline and the timing of
the watchpoint signals going to the core. The processor will stop on the next instruction
executed after the watchpoint triggers, which may be several instructions after the
watchpointed instruction began execution.

If there is an abort with the data access as well as a watchpoint, the abort exception
entry sequence is performed, and then the processor enters debug state. If there is an
interrupt pending, again the processor allows the exception entry sequence to occur
and then enters debug state. If the following instruction aborts, then the abort will
not be taken.

The Fault Status Register (FSR) in CP15 differentiates between the MMU and the
debug system aborting an access. Bit 9 of the FSR[3:0] field is forced to a zero if a
data abort occurs. If there is no data abort and a watchpoint occurs, DFSR[9] is forced
to a one. When software reads this register and sees DFSR[9] set to a one, the
remaining bits should be ignored.

4.3 Interrupts 

Once the processor has entered debug state, it is important that further interrupts do not
affect the instructions executed. For this reason, as soon as the processor enters debug
state, interrupts are disabled, although the state of the I and F bits in the PSR are not
affected.

4.4 Exceptions 

The order of exception priorities in the ARM10 core is as follows:

Highest   Reset
          Vector Trap*
          Data Abort
          Watchpoint
          CP Bounce
          FIQ
          IRQ
          JTAG HALT
          External Debug Request
          Prefetch Abort
          Register Breakpoint hit
          Instruction Breakpoint hit
Lowest    SWI/Undefined


*Vector traps can only be taken at the end of another exception. Once an 
exception entry sequence has completed, the trap for that sequence will be 
taken in preference over all following exceptions, except for Reset. 
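
A simple way to picture this ordering is as a lookup that selects the highest-priority event from a set of pending events. The Python sketch below is illustrative only: the names PRIORITY and highest_priority are hypothetical, and the special timing rule for vector traps given in the footnote is not modelled.

```python
# Exception priorities, highest first, in the order listed above.
PRIORITY = [
    "Reset",
    "Vector Trap",
    "Data Abort",
    "Watchpoint",
    "CP Bounce",
    "FIQ",
    "IRQ",
    "JTAG HALT",
    "External Debug Request",
    "Prefetch Abort",
    "Register Breakpoint",
    "Instruction Breakpoint",
    "SWI/Undefined",
]
RANK = {name: index for index, name in enumerate(PRIORITY)}


def highest_priority(pending):
    """Return the pending event with the highest priority, or None."""
    pending = list(pending)
    if not pending:
        return None
    return min(pending, key=RANK.__getitem__)
```

With this ordering, a pending IRQ and a pending watchpoint resolve to the watchpoint, while an instruction breakpoint on a SWI resolves to the breakpoint, consistent with Section 4.1.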



The table below summarizes the behavior of the debug logic in both halt and monitor
modes.





                     Halt Mode        Monitor Mode     Monitor Mode
Event                DbgGlobalEn=X    DbgGlobalEn=0    DbgGlobalEn=1    MOE[2:0]

Data Abort           Data abort       Data abort       Data abort       000
Register breakpoint  Halt             X                Prefetch abort   001
Watchpoint           Halt             X                Data abort       010
BKPT                 Halt             X                Prefetch abort   011
JTAG HALT            Halt             X                X                111
EDBGRQ               Halt             X                X                100
Prefetch Abort       Prefetch abort   Prefetch abort   Prefetch abort   000
Vector trap          Halt             X                X                101
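
The MOE[2:0] column lends itself to a small decode table in debugger software. The sketch below is a hedged illustration: it takes the 3-bit MOE value as already extracted (the position of the field within the DSCR is not assumed here), the names MOE_EVENTS and decode_moe are hypothetical, and the encoding 110, which does not appear in the table, is reported as unlisted.

```python
# MOE[2:0] encodings as listed in the table above.
MOE_EVENTS = {
    0b000: "abort (data or prefetch)",
    0b001: "register breakpoint",
    0b010: "watchpoint",
    0b011: "BKPT instruction",
    0b100: "EDBGRQ",
    0b101: "vector trap",
    0b111: "JTAG HALT",
}


def decode_moe(moe):
    """Map a 3-bit MOE value to the event name given in the table."""
    return MOE_EVENTS.get(moe & 0b111, "unlisted encoding")
```

A host tool could call decode_moe after the core halts to report why debug state was entered.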



In monitor mode, if a register breakpoint is hit or the BKPT instruction is executed, the
prefetch abort exception is taken. If a watchpoint hits, the data abort exception is taken.
The Fault Status Register (FSR) in CP15 is used to differentiate between the MMU and
the debug system aborting the access. An encoding has been added to the FSR to
indicate that a watchpoint hit. R14_abort points to the first instruction after the one
which has not been executed. The two pathological cases have been disabled: setting a
vector trap on a prefetch abort or a data abort is not allowed. Under no circumstances
should a JTAG HALT instruction be scanned into the part while in software mode, as
the handshake line that the debug logic uses to clear down the DbgHalt line - the
DbgInDebug line - never appears and would cause the processor to continually take
prefetch aborts (under the current Rev 0 hardware implementation).

For the case of load or store multiple instructions which have watchpoints set, other
instructions could have possibly (and probably) run underneath it. Since the debugger
will have to know the return PC value as well as the PC value of the load/store multiple
instruction, the data address of the watchpoint will be stored in the D-side Fault Address
Register (FAR), the PC of the watchpoint instruction itself (plus 0x8) will be stored in
I-side FAR. The restart PC will be held in the R14 as usual.

If the undefined instruction exception is taken while the core is in debug state, i.e., if a
debugger scans an undefined instruction into the core while in debug state, the core will
change modes and change the PC to the undefined instruction trap vector. The debugger
can use this information to determine when an undefined instruction has been seen by
the core. An example might be a coprocessor instruction which bounces because it's not
supported for a given implementation.

4.5 Vector Catching

The ARM10 debug unit contains logic that allows efficient trapping of fetches from
the vectors during exceptions. This is controlled by the Vector Catch Enables located
in the DSCR. If one of the bits in this register field is set HIGH and the corresponding
exception occurs, the processor behaves as if a register breakpoint has been set on an
instruction fetch from the relevant exception vector, then halts. The vector trap
enables are writeable only from JTAG.

For example, if the processor is in halt mode and executes a SWI instruction while
DSCR[18] is set, the ARM10 fetches an instruction from 0x8. The vector catch
hardware detects this access and sets an internal breakpoint signal, forcing the core to
stop.

The vector catch logic is sensitive only to fetches from the vectors during exception
entry. Therefore, if the code branches to an address within the vectors during normal
operation, and the corresponding vector catch enable bit is set, the processor is not
forced into debug state. A register breakpoint can be used to catch any instruction fetch
from a vector.

The state priority of interrupts over breakpoints was the cause of a vector catching bug
in the ARM9. For ARM10, if an interrupt request occurs during the issue of an
instruction fetched from an exception vector which has been vector caught, then the core
will handle the vector trap first. Once the core is restarted, the processor will then handle
the interrupt request.



5.0 Thumb

In debug state, the T bit in the CPSR is read/writeable by the debugger, and it does not
get altered when entering or exiting debug state. Also, it does not affect the type of
instruction executed while in debug state. When leaving debug state, the CPSR T bit
determines whether ARM or Thumb instructions will be executed.

The T bit in the DSCR indicates the type of instruction (either ARM or Thumb) the
debugger is about to execute and can be used to force the processor into ARM mode or
Thumb mode once the core is halted, as it will be sent along with each instruction sent to
the prefetch unit. In other words, the T bit in the DSCR controls the decode of the
instruction in the ITR. After debug state entry, the debugger should clear the T bit in the
DSCR to ensure that ARM instructions can be issued to the ARM10. An entry sequence
for the debugger might look something like:

(enter debug state in halt mode)
Scan a 0 into the T bit of CP14 Register 1 (DSCR)
Read the CPSR in ARM mode to get the T bit information (optional)
Execute MCRs in ARM mode to extract ARM state information
Restore all the ARM state using MRCs
Move restart PC value to R0
Execute an ARM MOV of R0 into the PC
Restore R0
(poll to ensure that the instruction has completed)
(exit debug state by issuing a RESTART command through JTAG)
Poll for Core Restarted bit

The core will use the T bit in the CPSR as the current mode upon exiting debug state.

6.0 Implementation Issues

The debug architecture is intended to be implemented with two clock domains, a fast
processor clock, and a slow JTAG clock. A great deal of the debug hardware runs at the
fast clock speed, and synchronization between the two is performed as the instruction
and data transfer registers are read and written from the JTAG scan chains. It is possible
to run the JTAG clock faster than the core clock.

Using CP14 as the source and destination for data transfers reuses existing paths
within the ARM, avoiding the need to add extra inputs onto the data bus. On ARM10
the instruction register is provided as an early instruction input in the prefetch unit.

A few Multi-ICE issues came up during the course of the design which required a few
tweaks to be added to the update mechanisms within the JTAG hardware. While
issuing instructions to the core, the Multi-ICE software will be scanning out a bit after
going through the CaptureDR state in order to verify whether or not the instruction
previously issued has completed. If the instruction did not complete, the debug logic
will prevent two events from happening. First, when the TAP state machine passes
through the UpdateDR state, the value shifted into scan chain 5 will not be loaded into
the rDTR. Second, as the TAP state machine passes through the Run-Test/Idle state to
issue the instruction currently in the ITR, this instruction will not be issued to the
core. The thirty-third bit of the rDTR gets registered to create a "do_update" bit,
which is used to prevent the issue of the instruction and the updating of the rDTR.

More specifically, this logic is used for the following case. In order to attempt
extremely fast downloads, Multi-ICE will start by scanning data into the rDTR and
issuing the write to the core by going through Run-Test/Idle. For each subsequent
write operation, bit 0 of the rDTR will be examined to see if the previous instruction
completed. If the instruction has not finished, the debugger will present the same data
again. If, after a given number of attempts the instruction still has not completed, the
debugger should check the Abort Occurred bit in the DSCR to see if an abort occurred
at some point. If an abort did occur during a memory access, the "do_update" bit is
cleared out, preventing any subsequent instructions from being executed and another
possible attempt to access memory. This is especially important if the mode suddenly
changes because of the abort and memory which is not accessible in user mode
becomes accessible in privileged mode.
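
The retry behavior described above can be sketched from the debugger's side as follows. This is a minimal illustration, not the Multi-ICE implementation: scan_word and abort_occurred are hypothetical callbacks standing in for the JTAG operations, scan_word(word) is assumed to return True when the captured status bit shows the previous instruction completed (so the new word was accepted), and max_attempts is an arbitrary illustrative limit.

```python
def fast_download(words, scan_word, abort_occurred, max_attempts=8):
    """Write words through the rDTR, re-presenting data until accepted.

    scan_word(word) -> bool and abort_occurred() -> bool are
    hypothetical hooks supplied by the caller. Returns the number of
    words accepted by the target.
    """
    written = 0
    for word in words:
        for _ in range(max_attempts):
            if scan_word(word):
                break  # previous instruction completed, word accepted
        else:
            # Retry budget exhausted: consult the Abort Occurred bit.
            if abort_occurred():
                raise RuntimeError("abort occurred during download")
            raise TimeoutError("instruction did not complete")
        written += 1
    return written
```

Raising on an abort rather than continuing mirrors the hardware's clearing of the "do_update" bit, which likewise stops further instructions from being issued.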

Single stepping through code will now have to be done using the BKPT instruction,
placed at the location after the next instruction to be executed. Register breakpoints
can also be used.

LDC's are word-length only. It will be necessary to force the core to do the read/write
for byte/halfword accesses, then move the data into R5 to be scanned out, or vice
versa.

7.0 Cache (CP15) and Memory Operations

The debug unit is able to read cache information, as well as TLB information since
CP14 and CP15 share some functionality. The pipeline follower and coprocessor
interface all exist in the same block. This allows CP14 to update the FSR in CP15
very simply. CP15 contains sixteen registers, one of which (R15) is used as the
interface for the debugger. By reading and writing to R15 using MRC and MCR
instructions, the debug unit has full visibility of the caches. See the ARM1020 uA
Specification for the Cache for more information.

After the state of the machine has been changed to debug state, reads of data from
memory do not cause either I-Cache or D-Cache to insert new entries in the cache. In
other words, no accesses are treated as cacheable. This is done by forcing the caches into
a noncacheable, nonbufferable mode. Cache misses are disregarded by the HUM buffer
while in debug state so that no line fills are generated.

Any program which modifies instruction data (e.g., self-modifying code) needs to
flush the I-Cache after the write by the D-Cache in order to maintain coherency.






When in debug mode, memory accesses in user mode will appear as user mode
accesses to the memory systems and the MMUs. Memory accesses in privileged mode
will appear as privileged accesses.

8.0 Debug Signals

Name        Direction   Description

COMMRX      Output      Communications Channel Receive. When HIGH, this signal
                        denotes that the comms channel receive buffer contains
                        data waiting to be read by the CPU.

COMMTX      Output      Communications Channel Transmit. When HIGH, this signal
                        denotes that the comms channel transmit buffer is empty.

DBGACK      Output      Debug Acknowledge. When HIGH, this signal indicates the
                        ARM1020 is in debug state.

DBGEN       Input       Debug Enable. This input signal allows the debug features
                        of the ARM1020 to be disabled. This signal should be LOW
                        only when debugging will not be required.

EDBGRQ      Input       External Debug Request. When driven HIGH, this causes the
                        processor to stop when in Halt mode.

INSTREXEC   Output      Instruction Executed. Indicates that in the previous cycle
                        the instruction in the execute stage of the pipeline
                        passed its condition codes and was executed. NOT
                        AVAILABLE IN REV0.

9.0 JTAG Signals

Name         Direction   Description

IR[3:0]      Output      TAP Controller Instruction Register. These four bits
                         reflect the current instruction loaded into the TAP
                         controller instruction register. The bits change on the
                         falling edge of TCK when the state machine is in the
                         UPDATE-IR state.

SCREG[4:0]   Output      Scan Chain Register. These five bits reflect the ID
                         number of the scan chain currently selected by the TAP
                         controller. These bits change on the falling edge of TCK
                         when the TAP state machine is in the UPDATE-DR state.

SDOUTBS      Input       Serial Data Out from an external or Boundary Scan chain.
                         It should be set up on the falling edge of TCK. When an
                         external boundary scan chain is not connected, this input
                         should be tied LOW.

TAPSM[3:0]   Output      TAP Controller State Machine. This bus reflects the
                         current state of the TAP controller state machine. These
                         bits change off the rising edge of TCK.

TCK          Input       The JTAG clock (the test clock).

TDI          Input       Test Data Input; the JTAG serial input.

TDO          Output      Test Data Output; the JTAG serial output.

nTDOEN       Output      Not TDO Enable. When LOW, this signal denotes that serial
                         data is being driven out on the TDO output. nTDOEN would
                         normally be used as an output enable for a TDO pin in a
                         packaged part.

TMS          Input       Test Mode Select. TMS selects to which state the TAP
                         controller state machine should change.

nTRST        Input       Not Test Reset. Active-low reset signal for the boundary
                         scan logic. This pin must be pulsed or driven LOW after
                         power up to achieve normal device operation, in addition
                         to the normal device reset (nRESET).

CLAIMS

1. Apparatus for processing data, said apparatus comprising:

a main processor responsive to main processor instructions within a stream of
instructions input to said main processor to perform main processor operations;

a coprocessor coupled to said main processor via a coprocessor interface and
responsive to coprocessor instructions within said stream of instructions to perform
coprocessor operations; wherein

said coprocessor is a debug coprocessor operable to at least partially control
generation of diagnostic data for debugging said apparatus for processing data and
said coprocessor instructions are debug coprocessor instructions that control operation
of said debug coprocessor.

2. Apparatus as claimed in claim 1, wherein said debug coprocessor comprises
one or more debug coprocessor registers.

3. Apparatus as claimed in claim 2, wherein said main processor comprises an
instruction address bus for transmitting instruction addresses associated with
instructions within said stream of instructions, said one or more debug coprocessor
registers includes a breakpoint register for storing a breakpoint value and said debug
coprocessor includes a breakpoint comparator for comparing said breakpoint value
with said instruction addresses upon said instruction address bus and generating a
breakpoint indication signal when said breakpoint value matches an instruction
address upon said instruction address bus.

4. Apparatus as claimed in claim 3, wherein said one or more debug coprocessor
registers includes a breakpoint control register associated with said breakpoint register
for storing a breakpoint control value specifying parameters that control operation of
said breakpoint comparator.

5. Apparatus as claimed in claim 4, wherein said parameters include at least one
of: a mask value; a breakpoint enable flag; and a mode selecting value for controlling
in which of a plurality of operational modes of said apparatus for data processing said
breakpoint comparator is active.

6. Apparatus as claimed in any one of claims 2 to 5, wherein said main processor
comprises a data address bus for transmitting data addresses associated with data
values processed by said apparatus for processing data, said one or more debug
coprocessor registers includes a watchpoint register for storing a watchpoint value and
said debug coprocessor includes a watchpoint comparator for comparing said
watchpoint value with said data addresses upon said data address bus and generating a
watchpoint indication signal when said watchpoint value matches a data address
upon said data address bus.

7. Apparatus as claimed in claim 6, wherein said one or more debug coprocessor
registers includes a watchpoint control register associated with said watchpoint
register for storing a watchpoint control value specifying parameters that control
operation of said watchpoint comparator.

8. Apparatus as claimed in claim 7, wherein said parameters include at least one
of: a mask value; a watchpoint enable flag; and a mode selecting value for controlling
in which of a plurality of operational modes of said apparatus for data processing said
watchpoint comparator is active.

9. Apparatus as claimed in any one of claims 2 to 8, wherein at least one of said
one or more debug coprocessor registers is accessed in response to a debug
coprocessor register access instruction within said stream of instructions.

10. Apparatus as claimed in any one of claims 2 to 9, wherein at least one of said
one or more debug coprocessor registers is accessed via a serial scan chain, said serial
scan chain operating under control of a scan chain controller.

11. Apparatus as claimed in claim 10, wherein said one or more debug
coprocessor registers includes a debug data value register for storing a data value
accessible to said main processor, said data value being accessed via said serial scan
chain.

12. Apparatus as claimed in claim 11, wherein said debug data value register
comprises a first debug data register that is writable from said serial scan chain and
readable by said main processor and a second debug data register that is readable by
said serial scan chain and writable from said main processor.

13. Apparatus as claimed in any one of claims 10, 11 and 12, wherein said one or
more debug coprocessor registers includes a debug instruction register for storing a
debug instruction for execution by said main processor, said debug instruction being
transferred to said debug instruction register via said serial scan chain.

14. Apparatus as claimed in any one of claims 10 to 13, wherein said one or more
debug coprocessor registers includes debug status control register, said debug status
control register storing data specifying one or more of: which condition triggered entry
into a debug mode; a flag enabling said debug coprocessor; and flags controlling main
processor vector instruction traps.

15. Apparatus as claimed in any one of the preceding claims, wherein said
coprocessor interface includes one or more signal lines for transferring signals
generated by said debug coprocessor during debugging operation including one or
more of: a signal to trigger a hold in a main processor pipeline of said main processor;
a signal to trigger a hold in a coprocessor pipeline of a further coprocessor coupled to
said coprocessor bus; and a signal to trigger cancelling of a coprocessor operation in
said further coprocessor.

16. Apparatus as claimed in any one of claims 10 to 15, wherein said main
processor and said debug coprocessor are driven by a common clock signal and said
scan chain controller is driven by an asynchronous scan chain clock signal.

17. A method of processing data, said method comprising the steps of:

in response to main processor instructions within a stream of instructions input
to a main processor performing main processor operations;

in response to coprocessor instructions within said stream of instructions
controlling a coprocessor coupled to said main processor via a coprocessor interface
and to perform coprocessor operations; wherein

said coprocessor is a debug coprocessor operable to at least partially control
generation of diagnostic data for debugging said apparatus for processing data and
said coprocessor instructions are debug coprocessor instructions that control operation
of said debug coprocessor.

18. An apparatus for processing data as claimed in claim 1 and substantially as
hereinbefore described with reference to the accompanying drawings.

19. A method of processing data as claimed in claim 17 and substantially as
hereinbefore described with reference to the accompanying drawings.